Flexible indexing blog post #795

Open
wants to merge 9 commits into
base: main

scottyhq
Contributor

Supersedes #597

vercel bot commented Aug 11, 2025

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| xarray-dev | Ready | Preview | Comment | Aug 13, 2025 10:01pm |

netlify bot commented Aug 11, 2025

Deploy Preview for xarraydev ready!

Name Link
🔨 Latest commit caf7663
🔍 Latest deploy log https://app.netlify.com/projects/xarraydev/deploys/689d0b073c63bc00080a3fee
😎 Deploy Preview https://deploy-preview-795--xarraydev.netlify.app
Member

@keewis keewis left a comment

thanks, Scott, that was a nice read.

I didn't check how this looks, but from the markdown at least this looks good to me in general. I do have a few comments on the details, though.

Comment on lines +26 to +35
// const bannerChildren = (
// <Link
// href='https://docs.google.com/forms/d/e/1FAIpQLSeGvTLONF-24V7z2HoACm4MhEr82c2V-VIzA9eqM9-jt-Xh8g/viewform?usp=sharing&ouid=111570313164368772519'
// fontSize='sm'
// >
// {' '}
// {/* Add your second link here, smaller font */}
// <b>SciPy 2025</b> Click here for info about an Xarray for Bio Sprint!
// </Link>
//)
Member

remove? Or did you want to leave that as an example?

Contributor Author

I could go either way. I commented it out instead of deleting it so that it's easier to go back to the existing multi-line banner.

</figcaption>
</figure>

{/* This is a comment that won't be rendered! */}
Member

Suggested change
{/* This is a comment that won't be rendered! */}

Contributor Author

Yeah, I'll add a small section to the readme on authoring tips. I was commenting some things out while testing the build locally and had to search around to figure out this syntax! Eventually it would be nice to have a more streamlined blogging interface, just using mystmd or jupyterbook. I'll open an issue with a couple of ideas.

> In brief, an _index_ makes repeated selection of data more efficient. Xarray Indexes connect coordinate labels to associated data values and encode important contextual information about the coordinate space.
Examples of indexes are all around you and are a fundamental way to organize and simplify access to information.
If you want a book about Natural Sciences, you can go to your local library branch and head straight to section `500`, or if you're in the mood for a good novel go to section `800`. Connecting thematic labels with numbers is a classic indexing system that's been around for hundreds of years [(Dewey Decimal System, 1876)](https://en.wikipedia.org/wiki/Dewey_Decimal_Classification).
Member

this might be a US thing, I've never seen this kind of numbering scheme (although admittedly I've not been to a lot of libraries with science sections). Still, the example is interesting and I would keep it, I'd just qualify it with something like "In the US ..." (or whatever fits)

Contributor Author

Good call, definitely Dewey Decimal is a US thing, and there are plenty of alternatives :)

## Pandas.Index

Xarray's [label-based selection](https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-with-dimension-names) allows a more expressive and simple syntax in which you don't have to think about the index (`da.sel(x=8) = 40`). Up until now, Xarray has relied exclusively on [Pandas.Index](https://pandas.pydata.org/docs/user_guide/indexing.html), which is still used by default:
Member

to be closer to the import

Suggested change
Xarray's [label-based selection](https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-with-dimension-names) allows a more expressive and simple syntax in which you don't have to think about the index (`da.sel(x=8) = 40`). Up until now, Xarray has relied exclusively on [Pandas.Index](https://pandas.pydata.org/docs/user_guide/indexing.html), which is still used by default:
Xarray's [label-based selection](https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-with-dimension-names) allows a more expressive and simple syntax in which you don't have to think about the index (`da.sel(x=8) = 40`). Up until now, Xarray has relied exclusively on [pandas.Index](https://pandas.pydata.org/docs/user_guide/indexing.html), which is still used by default:

Member

also, should this mention databases and hash-tables? It doesn't have to, just a thought.

Contributor Author

> should this mention databases and hash-tables? It doesn't have to, just a thought.

I'm actually not sure what you mean here. I thought pandas.Index is a hash table? But my understanding of this space is admittedly not that great 😅. If you have a sentence or two, please suggest!

Member

that's correct. I think I remember hearing that hash tables were created for databases, so that's why I mentioned both (but either way, I don't think it's necessary to follow this, and I don't have any specific sentences in mind).
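
For readers following this thread, the hash-table idea behind label-based selection can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how pandas or Xarray implement it; the labels, values, and the `sel` and `sel_by_scan` helpers are hypothetical, chosen so that the label `8` maps to `40` as in the quoted sentence.

```python
# Hypothetical coordinate labels and data values (x=8 corresponds to 40).
x_labels = [2, 4, 6, 8]
values = [10, 20, 30, 40]

# Build the index once: a dict is a hash table mapping label -> position.
label_to_position = {label: pos for pos, label in enumerate(x_labels)}


def sel(label):
    """Look up a value by coordinate label using the prebuilt index."""
    return values[label_to_position[label]]


def sel_by_scan(label):
    """Same result without an index: scan the labels on every call."""
    return values[x_labels.index(label)]


print(sel(8))  # 40
```

Both helpers return the same value; the difference is that the indexed version pays the cost of building the mapping once, which is what makes repeated selection cheaper.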


There are currently over 7000 commonly used [Coordinate Reference Systems (CRS)](https://spatialreference.org/ref/epsg/) for geospatial data in the authoritative EPSG database!
And of course an infinite number of custom-defined CRSs.
[xproj.CRSIndex](https://xproj.readthedocs.io/en/latest/) gives Xarray objects an automatic awareness of the coordinate reference system operations like `xr.align()`, which can raise an an informative error when there is a CRS mismatch:
Member

Suggested change
[xproj.CRSIndex](https://xproj.readthedocs.io/en/latest/) gives Xarray objects an automatic awareness of the coordinate reference system operations like `xr.align()`, which can raise an an informative error when there is a CRS mismatch:
[xproj.CRSIndex](https://xproj.readthedocs.io/en/latest/) gives Xarray objects an automatic awareness of the coordinate reference system operations like `xr.align()`, which can raise an informative error when there is a CRS mismatch:
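
The behaviour described in that sentence, raising an informative error on a CRS mismatch during alignment, can be sketched roughly as follows. This is a toy model under stated assumptions: `ToyCRSIndex` and `check_alignable` are hypothetical names and do not reflect the actual xproj or Xarray API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCRSIndex:
    """Hypothetical stand-in for a CRS-aware index; not xproj's class."""

    crs: str  # for example an EPSG code string


def check_alignable(*indexes):
    """Return the shared CRS, or raise if the indexes disagree."""
    if not indexes:
        raise ValueError("need at least one index to align")
    crs_values = {index.crs for index in indexes}
    if len(crs_values) > 1:
        raise ValueError(
            f"CRS mismatch: cannot align objects with {sorted(crs_values)}"
        )
    return crs_values.pop()


# Matching CRSs align without complaint.
print(check_alignable(ToyCRSIndex("EPSG:4326"), ToyCRSIndex("EPSG:4326")))

# Mismatched CRSs fail early with a message naming both systems.
try:
    check_alignable(ToyCRSIndex("EPSG:4326"), ToyCRSIndex("EPSG:32610"))
except ValueError as err:
    print(err)
```

The point of the sketch is that the index, rather than the user, carries the CRS metadata, so the check can happen automatically before any data are combined.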

summary: 'An introduction to customizable coordinate-based data selection and alignment for more efficient handling of both traditional and more exotic data structures'
---

\_TLDR: Xarray>2025.6 has been through a major refactoring of its internals that makes coordinate-based data selection and alignment customizable, enabling more efficient handling of both traditional and more exotic data structures. In this post we highlight a few examples that take advantage of this new superpower!
Member

The TLDR looks a bit weird with the leading underscore after rendering, and I'd probably write the version comparison as xarray>2025.06.0. Is it truly > or rather >=, though?

Contributor Author

> xarray>2025.06.0. Is it truly > or rather >=?

While some pieces of this have been around longer, I know there have been a lot of incremental improvements, so I thought I'd play it safe and go very recent. @benbovy do you have a preference?

Member

@keewis keewis Aug 13, 2025

the version itself is fine (but I didn't check for anything specific that we'd need), I just found the > a bit unusual, I would have used >= instead (and adjusted the version as necessary)
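
To make the practical difference between `>` and `>=` concrete, here is a small sketch that compares release numbers as integer tuples. It is a simplification that only handles purely numeric versions; `parse_release` is a hypothetical helper, not part of any packaging tool. Whether the 2025.6.0 release itself contains everything the post relies on is not settled in this thread.

```python
def parse_release(version, width=3):
    """Turn '2025.06.0' into (2025, 6, 0); int() drops leading zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))  # pad '2025.6' to three segments
    return tuple(parts)


minimum = parse_release("2025.06.0")
installed = parse_release("2025.6")

# '2025.06.0' and '2025.6' denote the same release once normalized.
print(installed == minimum)  # True

# A strict '>' excludes that release; '>=' includes it.
print(installed > minimum)   # False
print(installed >= minimum)  # True
```

So a strict `>` would tell users on exactly that release that they need to upgrade, while `>=` would not.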
